Global climatological data of ocean thermohaline parameters derived from WOA18

This is a global ocean climatological dataset of 17 thermohaline parameters such as isothermal layer (ITL) depth (hT), mixed layer (ML) depth (hD), thermocline gradient (GT), pycnocline gradient (GD) determined from temperature (T) and salinity (S) profiles of the National Centers for Environmental Information (NCEI) world ocean atlas 2018 (WOA18) using the double gradient method along with the identity-index (i-index) showing the quality of the determination. With the identified (hT, hD) and (GT, GD), other parameters such as ITL heat content (HITL), mixed layer fresh-water content (FML), maximum thermocline gradient (GTmax), thermocline depth (hth), temperature at thermocline depth (Tth), maximum density gradient GDmax), pycnocline depth (hpyc), density at pycnocline depth (ρpyc), and (hT − hD) (barrier layer if positive or compensated layer if negative). The dataset is located at the NOAA/NCEI website (10.25921/j3v2-jy50). It provides useful background information for ocean mixed layer dynamics, air-sea interaction, climatological studies. This paper is the only document for the dataset.

( ) 0 for (in ITL), ( ) 0 for (in ML) ( )or ( )evident for (in thermocline) or ( in pycnocline) (1) Two types of methodology, single near-zero gradient and double gradients, are available to determine h T and h D from vertical T and ρ profiles. The single near-zero gradient method requires either the deviation of T (or ρ) from its value near the surface (i.e., reference level) to be smaller than a certain fixed value, such as 0.8 °C 9 , 0.2 °C 15,16 to 0.1 °C 17 for T, (0.03 kg/m 3 ) 15 , (0.125 kg/m 3 ) 18 , to (0.05 kg/m 3 ) 19 for ρ, or the near-zero vertical gradient to be smaller than a certain fixed value, such as (0.025 °C/m) 6,10 , (0.02 °C/m) 20 , to (0.015 °C/m) 21 for G T (z k ). To eliminate or reduce such uncertainty, the split-and-merge (SM) 22 , maximum curvature [23][24][25] , optimal linear fitting 26 , and maximum angle 27 methods have been developed. The double gradient method is based on the transition of a near-zero gradient in the ITL (or ML) to an evident gradient in the thermocline (or pycnocline). The double gradient method with the exponential leap-forward gradient (ELG) 28,29 was used in this study.
After (h T , h D ) are determined from individual (T, ρ) profiles, the barrier (or compensated) layer depth, ocean heat content (OHC) for ITL (H ITL ) 30 , freshwater content (FWC) for ML (F ML ) were calculated. Note that H ITL (or F ML ) is the vertical integration of temperature (salinity) profile from the surface (z = 0) down to the base of the ITL (z = −h T ) [or ML (z = −h D )] rather than to fixed depths such as (H 700 , F 700 ) for the upper 700 m. Obviously, H ITL and F ML are new and different from traditionally defined OHC (or FWC) in the oceanographic community with fixed depth intervals from the ocean surface such as 0-150 m (in the Indian Ocean) 31 , 0-300 m 32 , 0-400 m 33 , 0-700 m 34 , 0-750 m 35 , 0-2000 m 36 , deep layer OHC such as below 2000 m 37 , and the full layer OHC 38 . Interested readers are referred to two excellent review papers 39,40 .
From vertical gradient data below ITL (or ML) G T (z k ), z k < −h T [G D (z k ), z k < −h D ], other thermohaline parameters can be identified from each T-profile (ρ-profile), such as maximum temperature (density) gradient, thermocline (pycnocline) depth, temperature (density) at thermocline (pycnocline) depth. The quality of the (h T , h D ) data are estimated by the identification index 29 (i-index, see Error Estimation Section). Altogether, this dataset, containing 17 thermohaline parameters with i-index, is located at the NCEI website (Data Citation 1) for public use.
Global climatological (annual mean and monthly mean) data of ocean thermohaline parameters can be established through two approaches: (1) analysing climatological (T, S) profiles such as earlier version of WOA18 to obtain climatological (h T , h D ) data 5,9,13,41 , and (2) analysing observational profiles to get synoptic thermohaline parameters 29,30 and then using optimal interpolation 42 , Kalmen filter 43 , or optimal spectral decomposition 44 to produce gridded climatological thermohaline parameters. At present, we cannot estimate how big the difference is if using these two approaches. We can estimate only after the two approaches have been used. In this study, we take the first approach to derive climatological thermohaline parameter data from the NOAA/ NCEI World Ocean Atlas 2018 (WOA18) 45 annual and monthly mean temperature and salinity (S) profiles with regular 102 vertical levels (Table 1).
Furthermore, on the base of the single gradient method (i.e., near-zero gradient in the mixed layer), serval global climatological datasets of ocean (h T , h D ) were produced from the earlier version of WOA18 such as the NOAA/NCEI Mixed Layer Depth Data 5 , the Naval Research Laboratory Mixed Layer Depth (NMLD) Climatologies 41 , and comprehensive Mixed Layer Data for the Indian Ocean 13 . All these datasets don't include any parameters below the ITL/ML. On the base of the double-gradient method, this dataset 45 contains more thermohaline parameters from WOA18 such as thermocline gradient (G T ), pycnocline gradient (G D ), ITL heat content (H ITL ), mixed layer fresh-water content (F ML ), maximum thermocline gradient (G Tmax ), thermocline depth (h th ), temperature at thermocline depth (T th ), maximum density gradient G Dmax ), pycnocline depth (h pyc ), density at pycnocline depth (ρ pyc ), and (h T -h D ) (barrier layer if positive or compensated layer if negative).   www.nature.com/scientificdata www.nature.com/scientificdata/  www.nature.com/scientificdata www.nature.com/scientificdata/ Among them, ITL heat content (H ITL ), and mixed layer fresh-water content (F ML ) are different from the commonly used fixed-depth heat and freshwater contents.

Methods
These methods can be found in our related work 28,29 . Main part of pycnocline (or Thermocline). The ρ-profile [ρ(z k )] is taken for illustration. Let the depths corresponding to ρ min and ρ max be z 1 and z K . The vertical density difference, Δρ = ρ max − ρ min , represents the total variability. Theoretically, the variability is 0 in ML and large in pycnocline beneath the ML. It is reasonable to identify the main part of the pycnocline between the two depths: z (0.1) and z (0.7) , with the difference to ρ min as 0.1Δρ and 0.7Δρ (Fig. 2), respectively. Here, ρ(z (0.1) ) = ρ min + 0.1Δρ, and ρ(z (0.7) ) = ρ min + 0.7Δρ. pycnocline and thermocline gradients. The data between z (0.1) and ]. Vertical gradients are calculated between ρ i (i = 1, 2, …, I) and ρ 0 , is used to represent the characteristic gradient for the pycnocline, and it is simply called the pycnocline gradient (G D ). Similarly, the same procedure id used to obtain the thermocline gradient (G T ). The annual mean (G T , G D ) maps www.nature.com/scientificdata www.nature.com/scientificdata/ show gradients are stronger in tropical regions (20°S-20°N) than middle and high latitudes. In the low latitudes, the gradients are stronger in the eastern than western Pacific and Atlantic. The January and July mean (G T , G D ) maps show stronger seasonal variability in the northern hemisphere than in the southern hemisphere (Fig. 3). Fig. 2. Let the number of the data points between z 1 and z (0.7) be N g , and let N = [log 2 (N g )] with the bracket indicating the integer part of the real number inside. N is much smaller than N g . Starting from z 1 , the (N + 1) exponential leap-forward gradients (ELGs) are calculated at depth z k [between z 1 and z (0.7) ] (Fig. 4) www.nature.com/scientificdata www.nature.com/scientificdata/

ELG for determining MLD (ILD). Go back to the original profile shown in
which represents the gradient effectively at the depth z k with capability to filter out noises in the gradient calculation 28,29 . Since pyc if z k in the pycnocline, it is reasonable to use (an order of smaller gradient in mixed layer than in pycnocline),   Figure 7 shows the annual, January, July mean (T ITL , ρ ML ). Note that (T ITL , ρ ML ) are not the same as the sea surface temperature and density. The variables (T ITL , ρ ML ) follow the ocean mixed layer dynamics. For example, T ITL satisfies following equation due to the ITL layer heat balance 46 www.nature.com/scientificdata www.nature.com/scientificdata/ where V is the vertically averaged horizontal velocity from the surface to −h T ; V and T are deviation from the vertical average; Q 0 is the net surface heat flux adjusted for the penetration of light below the ITL; − Q h T is the vertical turbulent diffusion at the base of ITL; w e is the entrainment velocity; Λ is the Heaviside unit function taking 0 if w e < 0, and 1 otherwise due to the second law of the thermodynamics; ΔT is the temperature difference between ITL and thermocline; T th is the thermocline temperature.
ITL heat content and ML freshwater content. The ITL heat content is the integrated heat stored in the layer from the sea surface (z = 0) down to ILD (z = − h T ), where c p = 3,985 J kg −1 °C −1 , is the specific heat for sea water. The annual, January, and July mean H ITL show that H ITL is higher in low latitudes than in middle and high latitudes and is higher in the western than eastern Pacific and Atlantic Oceans within the low latitudes (left panels in Fig. 8  www.nature.com/scientificdata www.nature.com/scientificdata/ The ML freshwater content (F ML ) is the integrated heat stored in the layer from the sea surface (z = 0) down to MLD (z = − h D ) 48 , The annual mean F ML is mostly positive in the North Pacific Ocean, the Southern Ocean, the Indian Ocean except the Arabian Sea, and mostly negative in the Atlantic Oceans. Four evident negative F ML areas are in the Arabian Sea, central and eastern South Pacific Ocean between equator to 25°S, the North Atlantic Ocean from the Caribbean Sea to the west coast of Spain, and the South Atlantic Ocean between the equator to 30°S (upper right panel in Fig. 8).
Seasonal variability is evident with stronger positive F ML in the North Pacific Ocean, weaker negative F ML in the South Pacific Ocean, stronger negative F ML in the North Atlantic Ocean, weaker negative F ML in the South Atlantic Ocean in January than in July (central and lower right panels in Fig. 8). Strongest positive F ML (>5 m) occurs in the northern (35° N-60° N) North Pacific Ocean in January. Strongest negative F ML (<−5 m) occurs in the central to eastern subtropical North Atlantic Ocean in January, and in the eastern subtropical South Pacific Ocean and the subtropical South Atlantic Ocean in July. Note that the ML freshwater content (F ML ) dataset is also new since it is different from the fixed-depth freshwater content.
Thermocline and pycnocline. Maximum gradients can be obtained from vertical gradients of (T, ρ) between z (0.1) and z (0.7) calculated by Eq. (2), www.nature.com/scientificdata www.nature.com/scientificdata/ which are used to represent the depths of thermocline (h th ) and pycnocline (h pyc ) The temperature at the thermocline depth is defined as the thermocline temperature The density at the pycnocline depth is defined as the pycnocline density Annual mean G Tmax and G Dmax have evident latitudinal variability with large values in the tropical regions and small values outside the tropical region (upper two panels in Fig. 9). In the tropical Pacific, Atlantic, and Indian Oceans, the large values are in the central and eastern parts. Seasonal variability of (G Tmax , G Dmax ) is strong in the Northern Hemisphere and weak in the Southern Hemisphere. In the Northern Hemisphere, the difference is small between annual and January mean (G Tmax , G Dmax ) but large between annual and July mean (G Tmax , G Dmax ). In July, very strong (G Tmax , G Dmax ) occur in the northeastern Asian and American coastal regions, and strong (G Tmax , G Dmax ) appear in the Kuroshio extension and Gulf Stream regions (middle and lower panels in Fig. 9).
Annual mean h th and h pyc have evident spatial variability with large values in low latitudes (30°S-30°N) with shallowing of (h th , h pyc ) in the eastern Pacific and Atlantic Oceans (upper two panels in Fig. 10). Seasonal variability of (h th , h pyc ) is strong in middle and high latitudes with deep (h th , h pyc ) in (30°N-60°N) in January and deep (h th , h pyc ) in (30°S-60°S) in July (middle and lower panels in Fig. 10).
Annual mean T th and ρ pyc have evident spatial variability with warm T th and light ρ pyc in low latitudes (30°S-30°N) and cold T th and dense ρ pyc in middle and high latitudes (30°S-60°S, 30°N-60°N) (upper two panels www.nature.com/scientificdata www.nature.com/scientificdata/ in Fig. 11). Seasonal variability of (T th , ρ pyc ) is generally weak except in subtropical oceans where warmer T th (light ρ pyc ) in the Sothern Hemisphere in January and in the Northern Hemisphere in July. Furthermore, cold T th and dense ρ pyc appears in North Atlantic Ocean near Greenland in January (middle and lower panels in Fig. 11).
Global statistics of the thermohaline parameters. The identified thermohaline parameters are on the grid points. We use the standard Matlab codes to calculate with area-weighted mean, standard deviation, skewness, and kurtosis for each parameter. Table 2 shows the statistical characteristics of thermohaline parameters derived from the annual mean (T, ρ) profiles. These values can be treated as the overall climatological values of global thermohaline parameters, such as 51.4 m for isothermal layer depth, 38.7 m for mixed layer depth, 14.3 m for barrier layer depth, 99.4 m the thermocline depth, 73.9 m for the pycnocline depth, 0.0638 °C/m for thermocline gradient, and 0.0212 kg/m 4 for pycnocline gradient.

Data records
This global ocean climatology of thermohaline parameter dataset is publicly available at the NOAA/NCEI data repository as a NetCDF file, which includes data citation, dataset identifiers, metadata, and ordering instructions. The dataset is located at the NOAA/NCEI website (https://doi.org/10.25921/j3v2-jy50).   Table 2. Statistical characteristics of thermohaline parameters derived from the annual mean (T, ρ) profiles. i.e., thermocline not existence. Here, the ITL depth is marked by a circle. The thick line in each figure is a single linear function fitted to the temperature profile data.

Fig. 13
Maps and histograms of the i-index to determine (h T , G T ) from the annual, January, July mean temperature profiles (left panels) and (h D , G D ) from the annual, January, July mean density profiles (right panels). Here, top panels are for the annual mean; middle panels are for the January mean; and lower panels are for the July mean.
www.nature.com/scientificdata www.nature.com/scientificdata/ The methodology is evaluated through the comparison between the fitted profiles [ ρ T z z ( ), ( ) k k ] and the WOA18 profiles [ ρ T z z ( ), ( ) k k ]. We take T-profile for illustration. The fitted T z ( ) k profile is represented by two lines with the first one (near zero gradient in the ITL) from the top to the ITL base (circles in Fig. 12) and the second one (non-zero gradient in the thermocline) from the ITL base to the bottom of the profile. The quality in determination of (h T , G T ) is identified by the sum of the square error (SSE) between the fitted T z ( ) k and WOA18 T(z k ) at the WOA18 vertical levels (see Table 1), k k k 2 Such double-gradient fitting has errors in the ITL represented by the sum of the square error in ITL (SSE ITL ) and in the thermocline represented by the sum of the square error in thermocline (SSE TH ). The whole T-profile data fitted to a single linear function (thick lines in Fig. 12) represents the maximum error since it disregards the existence of ITL and thermocline. Such a maximum error is represented by the total sum of the square error (SSE T ). The identification index (called i-index) is defined by 29

ITL I TL TH T
If an ITL exists but is not identified (h T = 0) (Fig. 12a), SSE ITL = 0, SSE TH = SSE T ; which gives I ITL = 0. If the identified h T is shorter than the real one (Fig. 12b), SSE ITL = 0, SSE TH > 0, SSE TH < SSE T ; which leads to 0 < I ITL < 1. If the identified h T is the same as the real one (Fig. 12c), SSE ITL = 0, SSE TH = 0, SSE T > 0; which makes I ITL = 1. If the identified h T is deeper than the real one (Fig. 12d), SSE ITL > 0, SSE TH = 0, SSE TH < SSE T ; which leads to 0 < I ITL < 1. If the identified h T reaches the bottom of the thermocline (Fig. 12e), SSE TH = 0, SSE ITL = SSE T ; which gives I ITL = 0.
Thus, the value of I ITL represents the quality of the determination of h T from T-profile: (a) I ITL = 1 for no error (Fig. 12c), (b) I ITL = 0 for 100% error with no ITL identified but actual existence of ITL (Fig. 12a) or identified h T at the bottom of the thermocline (Fig. 12e,c) 0 < I ITL < 1 for identified h T shallower than the actual h T (Fig. 12b) and for identified h T deeper than the actual h T (Fig. 12d).
The annual, January, July mean maps and histograms of the i-index for ILD show high quality of the method. The global average i-index is 0.849 for the annual mean, 0.865 for the January mean, and 0.882 for the July mean (left panels in Fig. 13). The annual, January, July mean maps and histograms of the i-index for MLD show quality of the method. The global average i-index is 0.841 for the annual mean, 0.858 for the January mean, and 0.865 for the July mean (right panels in Fig. 13).

Code availability
The MATLAB codes to determine these thermohaline parameters from WOD18 (T, S) were published in the two related papers 28,29 , and can be obtained at https://doi.org/10.1007/s10872-017-0418-0. The main program is Thermohaline.m, along with four Matlab functions: validate.m, getgradient.m, ELGCore.m, and Iindex.m (see the Technical Validation Section). The function validate.m is employed to identify if (T, ρ) profiles having double gradient structure. The function 'getgradient.m' is used to calculate the vertical gradient. The function ELGCore.m is used to get (h T , G T ) or (h D , G D ) from T-profile or ρ-profile. The function Iindex.m is used to calculate the iindex (I ITL ) for the error estimation. Then, the code 'Thermohaline.m' generates other thermohaline parameters such as ITL heat content, mixed layer fresh-water content, maximum thermocline gradient, thermocline depth, temperature at thermocline depth, maximum density gradient, pycnocline depth, density at pycnocline depth, barrier layer depth, and compensated layer depth. Since the calculation is local for individual (T, S) profile pair, interested readers may use our MATLAB codes to analyse any (T, S) profiles to get the derived thermohaline data with quality identification (i.e., i-index).